Adaptive Audio-visual Speech Recognition in the Presence of Audio and Video Distortions

Authors

Martin Heckmann

Frédéric Berthommier

Christophe Savariaux

Kristian Kroschel

Abstract

Audio-visual speech recognition significantly improves on audio-only recognition, especially when the audio signal is corrupted by noise. In this article we investigate how additional degradations of the video signal affect the audio-visual recognition process. We degrade the images with noise, JPEG compression, and errors in the localization of the mouth region. The first question we address is how noise in the video stream influences the recognition scores. To answer it, we add noise to both the audio and the video signals at different SNR levels. The second question is how the additional noise in the video stream affects the adaptation of the fusion parameter, which controls the contribution of the audio and video streams to the recognition. We compare the results obtained when the fusion parameter is adapted to the noise in both the audio and the video streams with those obtained when it is adapted only to the noise in the audio stream, so that a clean video stream is assumed. For the second type of test we use an automatic adaptation of the fusion parameter based on the entropy of the a-posteriori probabilities from the audio stream.
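To make the idea of an entropy-driven fusion parameter concrete, the following is a minimal sketch, not the authors' method. The abstract states only that the parameter is adapted from the entropy of the audio a-posteriori probabilities; the linear mapping from normalized entropy to a weight, the log-linear combination of the two streams, and the names `entropy`, `fusion_weight`, and `fuse` are all assumptions made for illustration.

```python
import math


def entropy(posteriors):
    """Shannon entropy (in nats) of one frame of class posteriors."""
    return -sum(p * math.log(p) for p in posteriors if p > 0.0)


def fusion_weight(audio_frames):
    """Map the mean entropy of the audio posteriors to a weight in [0, 1].

    Assumption: a linear mapping from normalized entropy. Low entropy
    (confident audio) gives a weight near 1, high entropy (noisy audio)
    gives a weight near 0. The source does not specify the mapping.
    """
    n_classes = len(audio_frames[0])
    max_entropy = math.log(n_classes)
    mean_entropy = sum(entropy(f) for f in audio_frames) / len(audio_frames)
    lam = 1.0 - mean_entropy / max_entropy
    return min(1.0, max(0.0, lam))


def fuse(audio_post, video_post, lam, eps=1e-12):
    """Log-linear combination of the two streams, renormalized to sum to 1.

    Assumption: weight lam on the audio stream, (1 - lam) on the video stream.
    """
    scores = [
        lam * math.log(a + eps) + (1.0 - lam) * math.log(v + eps)
        for a, v in zip(audio_post, video_post)
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical posteriors over three classes.
clean_audio = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05]]
noisy_audio = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]
video_frame = [0.20, 0.70, 0.10]

lam_clean = fusion_weight(clean_audio)  # closer to 1: rely on audio
lam_noisy = fusion_weight(noisy_audio)  # closer to 0: rely on video
fused = fuse(noisy_audio[0], video_frame, lam_noisy)
```

Because the weight in this sketch is computed from the audio stream alone, it implicitly treats the video stream as clean, which is the situation the second question of the abstract examines.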

Similar resources

Improved Speech Recognition using Adaptive Audio-visual Fusion via a Stochastic Secondary Classifier

The adaptive fusion of video and audio is one of the fundamental pursuits of audio-visual speech recognition (AVSR). In this paper the use of a high-dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Results are presented that lie above or equal to the boundary of catastrophic fusion acros...

P1: Negative Television and Memory

According to reports, the time that about 30 thousand people spent watching television had an impact on their memory and recall, and the results showed no differences between men and women. The people who watched less than an hour a day did better at every memory function. As these contributors watched negative political ads, physiological responses indicated that their body was reflexively preparing to move...

Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach

Automatic speech recognition (ASR) on video data naturally has access to two modalities: audio and video. In previous work, audio-visual ASR, which leverages visual features to help ASR, has been explored on restricted domains of videos. This paper aims to extend this idea to open-domain videos, for example videos uploaded to YouTube. We achieve this by adopting a unified deep learning approach...

Adaptive Audio-Visual Speech Recognition with Distorted Audio and Video Data

Martin Heckmann (1), Frédéric Berthommier (2), Christophe Savariaux (2), Kristian Kroschel (3). (1) Honda Research Institute Europe, 63073 Offenbach, Germany, Email: [email protected]; (2) Institut de la Communication Parlée (ICP), 38031 Grenoble, France, Email: {bertho, savario}@icp.inpg.fr; (3) Institut für Nachrichtentechnik, Universität Karlsruhe, 76128 Karlsruhe, Germany, Email: [email protected]...

Characteristics of the Use of Coupled Hidden Markov Models for Audio-Visual Polish Speech Recognition

This paper focuses on combining audio-visual signals for Polish speech recognition under conditions of a highly disturbed audio speech signal. Recognition of audio-visual speech was based on combined hidden Markov models (CHMM). The described methods were developed for a single isolated command; nevertheless, their effectiveness indicated that they would also work similarly in continuous audio-visual sp...

Publication date: 2003
